Overview

Dataset statistics

Number of variables: 21
Number of observations: 98913
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.2 MiB
Average record size in memory: 140.0 B
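
The average record size can be cross-checked against the total size and the number of observations. This is a minimal sketch, assuming 1 MiB = 1024 × 1024 bytes and keeping in mind that the report rounds the total to one decimal, so the result only approximates the 140.0 B shown above.

```python
# Cross-check "Average record size in memory" from the reported totals.
# Assumption: 1 MiB = 1024 * 1024 bytes; 13.2 MiB is itself a rounded figure.
TOTAL_MIB = 13.2
N_OBSERVATIONS = 98913

total_bytes = TOTAL_MIB * 1024 * 1024
avg_record_bytes = total_bytes / N_OBSERVATIONS

print(f"{avg_record_bytes:.1f} B per record")
```

The small gap to 140.0 B is within the rounding of the 13.2 MiB figure.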

Variable types

Numeric: 12
Categorical: 5
Boolean: 4

Warnings

countryCode has a high cardinality: 199 distinct values (High cardinality)
seniority is highly correlated with seniorityAsYears (High correlation)
seniorityAsYears is highly correlated with seniority (High correlation)
gender is highly correlated with civilityGenderId and 1 other field (High correlation)
civilityGenderId is highly correlated with gender and 1 other field (High correlation)
civilityTitle is highly correlated with gender and 1 other field (High correlation)
socialNbFollowers is highly skewed (γ1 = 88.81691016) (Skewed)
socialNbFollows is highly skewed (γ1 = 220.8766787) (Skewed)
socialProductsLiked is highly skewed (γ1 = 244.1577429) (Skewed)
productsListed is highly skewed (γ1 = 64.89321853) (Skewed)
productsSold is highly skewed (γ1 = 41.59563253) (Skewed)
productsWished is highly skewed (γ1 = 49.25695941) (Skewed)
productsBought is highly skewed (γ1 = 84.79735987) (Skewed)
identifierHash has unique values (Unique)
socialProductsLiked has 82987 (83.9%) zeros (Zeros)
productsListed has 97189 (98.3%) zeros (Zeros)
productsSold has 96877 (97.9%) zeros (Zeros)
productsPassRate has 97979 (99.1%) zeros (Zeros)
productsWished has 89612 (90.6%) zeros (Zeros)
productsBought has 93494 (94.5%) zeros (Zeros)
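
The Skewed and Zeros warnings rest on two simple quantities: the skewness coefficient γ1 and the share of zero entries. The sketch below shows one way to compute both. It uses the plain population formula for γ1 (the report's figures may come from a bias-corrected estimator), and the two cut-off values are hypothetical placeholders, not values read from the report's configuration.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return float("nan")  # skewness is undefined for a constant column
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical cut-offs for illustration; not taken from the report's config.yaml.
SKEW_THRESHOLD = 20.0
ZERO_THRESHOLD = 0.10


def flags(values):
    """Return the warning labels a column would get under the cut-offs above."""
    found = []
    if abs(skewness(values)) > SKEW_THRESHOLD:
        found.append("Skewed")
    if zero_share(values) > ZERO_THRESHOLD:
        found.append("Zeros")
    return found


# Toy column: mostly zeros plus one very large value, like the product counters.
toy = [0] * 997 + [1, 2, 500]
print(flags(toy))

# The zero percentages in the warnings follow from the counts and the row total.
N_OBSERVATIONS = 98913
zero_counts = {
    "socialProductsLiked": 82987,
    "productsListed": 97189,
    "productsSold": 96877,
    "productsPassRate": 97979,
    "productsWished": 89612,
    "productsBought": 93494,
}
zero_pct = {name: round(100 * c / N_OBSERVATIONS, 1) for name, c in zero_counts.items()}
print(zero_pct)
```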

Reproduction

Analysis started: 2021-02-15 18:14:23.029439
Analysis finished: 2021-02-15 18:16:22.326457
Duration: 1 minute and 59.3 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

identifierHash
Real number (ℝ)

UNIQUE

Distinct: 98913
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -6.692038995 × 10^15
Minimum: -9.223101126 × 10^18
Maximum: 9.223330728 × 10^18
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.9 KiB

Quantile statistics

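Minimum: -9.223101126 × 10^18
5-th percentile: -8.30032878 × 10^18
Q1: -4.622894617 × 10^18
Median: -1.337988846 × 10^15
Q3: 4.616388118 × 10^18
95-th percentile: 8.305984346 × 10^18
Maximum: 9.223330728 × 10^18
Range: 1.844643185 × 10^19 (the report shows -3.122194426 × 10^14, which is Maximum − Minimum wrapped around in 64-bit integer arithmetic)
Interquartile range (IQR): 9.239282735 × 10^18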

Descriptive statistics

Standard deviation: 5.33080688 × 10^18
Coefficient of variation (CV): -796.5893332
Kurtosis: -1.201867217
Mean: -6.692038995 × 10^15
Median Absolute Deviation (MAD): 4.619299754 × 10^18
Skewness: 0.001133563788
Sum: 2.153133565 × 10^18 (a wrapped 64-bit value; Mean × 98913 observations is about -6.619 × 10^20)
Variance: 2.8417502 × 10^37
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                   Count  Frequency (%)
4.86001103 × 10^18          1  < 0.1%
-8.302234935 × 10^18        1  < 0.1%
-7.914149281 × 10^18        1  < 0.1%
6.582480496 × 10^18         1  < 0.1%
-8.950562328 × 10^18        1  < 0.1%
-8.021904012 × 10^18        1  < 0.1%
-4.962304304 × 10^18        1  < 0.1%
-4.149872466 × 10^18        1  < 0.1%
2.092426645 × 10^18         1  < 0.1%
6.002771697 × 10^18         1  < 0.1%
Other values (98903)    98903  > 99.9%

Minimum 5 values

Value                   Count  Frequency (%)
-9.223101126 × 10^18        1  < 0.1%
-9.223057731 × 10^18        1  < 0.1%
-9.222867488 × 10^18        1  < 0.1%
-9.222666406 × 10^18        1  < 0.1%
-9.222346324 × 10^18        1  < 0.1%

Maximum 5 values

Value                   Count  Frequency (%)
9.223330728 × 10^18         1  < 0.1%
9.223304665 × 10^18         1  < 0.1%
9.222858252 × 10^18         1  < 0.1%
9.222779374 × 10^18         1  < 0.1%
9.222469665 × 10^18         1  < 0.1%
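
The report gives a Range of about -3.122194426 × 10^14 for identifierHash, even though Maximum − Minimum is positive, and a Sum near 2.153133565 × 10^18, even though Mean × observations is about -6.6 × 10^20. Both figures are consistent with signed 64-bit wrap-around. The sketch below reproduces the effect with Python's arbitrary-precision integers. The inputs are the rounded values printed in the report, so the results match only to within that rounding, and `wrap_int64` is an illustrative helper, not part of any library.

```python
# Reported extremes and mean, rounded to 10 significant digits in the report.
MINIMUM = -9223101126 * 10**9
MAXIMUM = 9223330728 * 10**9
MEAN = -6692038995 * 10**6
N_OBSERVATIONS = 98913


def wrap_int64(x):
    """Reduce a Python int to the value a signed 64-bit integer would hold."""
    return (x + 2**63) % 2**64 - 2**63


true_range = MAXIMUM - MINIMUM            # exceeds 2**63 - 1
wrapped_range = wrap_int64(true_range)    # what int64 arithmetic would store

approx_sum = MEAN * N_OBSERVATIONS        # far outside the int64 range
wrapped_sum = wrap_int64(approx_sum)

print(f"true range    = {true_range:.9e}")
print(f"wrapped range = {wrapped_range:.9e}")
print(f"approx sum    = {approx_sum:.9e}")
print(f"wrapped sum   = {wrapped_sum:.9e}")
```

Since identifierHash is an identifier rather than a measurement, its Range, Sum, Mean and similar statistics carry no analytical meaning in any case.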

language
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 772.9 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 197826
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: en
2nd row: en
3rd row: fr
4th row: en
5th row: en
Value  Count  Frequency (%)
en     51564  52.1%
fr     26372  26.7%
it      7766   7.9%
de      7178   7.3%
es      6033   6.1%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
e      64775  32.7%
n      51564  26.1%
f      26372  13.3%
r      26372  13.3%
i       7766   3.9%
t       7766   3.9%
d       7178   3.6%
s       6033   3.0%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  197826  100.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  197826  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  197826  100.0%

The per-category, per-script and per-block character tables repeat "Most occurring characters" exactly, because every character belongs to the single category, script and block listed above.
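
The character counts for language follow directly from the category counts: each two-letter code contributes one occurrence of each of its letters per row. A short sketch that rebuilds the "Most occurring characters" figures from the value counts reported above:

```python
from collections import Counter

# Value counts for the language column, as reported above.
language_counts = {"en": 51564, "fr": 26372, "it": 7766, "de": 7178, "es": 6033}

# Every row contributes one occurrence of each letter in its code.
char_counts = Counter()
for code, rows in language_counts.items():
    for ch in code:
        char_counts[ch] += rows

total_chars = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(f"{ch}  {count:>6}  {100 * count / total_chars:.1f}%")
```

For example, "e" appears in en, de and es, which gives 51564 + 7178 + 6033 = 64775.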

socialNbFollowers
Real number (ℝ≥0)

SKEWED

Distinct: 90
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.432268761
Minimum: 3
Maximum: 744
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 3
5-th percentile: 3
Q1: 3
Median: 3
Q3: 3
95-th percentile: 5
Maximum: 744
Range: 741
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 3.882383028
Coefficient of variation (CV): 1.131141906
Kurtosis: 14415.30703
Mean: 3.432268761
Median Absolute Deviation (MAD): 0
Skewness: 88.81691016
Sum: 339496
Variance: 15.07289798
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
3                  84939  85.9%
4                   8219   8.3%
5                   2720   2.7%
6                    813   0.8%
7                    539   0.5%
8                    336   0.3%
9                    235   0.2%
10                   164   0.2%
11                   121   0.1%
12                    99   0.1%
Other values (80)    728   0.7%

Minimum 5 values

Value  Count  Frequency (%)
3      84939  85.9%
4       8219   8.3%
5       2720   2.7%
6        813   0.8%
7        539   0.5%

Maximum 5 values

Value  Count  Frequency (%)
744        1  < 0.1%
353        1  < 0.1%
205        1  < 0.1%
176        1  < 0.1%
172        1  < 0.1%
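
Several of the descriptive statistics for socialNbFollowers are derived from one another, which makes them easy to cross-check. The sketch below uses only the rounded figures printed above, so agreement is expected to several decimals rather than exactly.

```python
# Figures reported for socialNbFollowers.
MEAN = 3.432268761
STD = 3.882383028
N_OBSERVATIONS = 98913

cv = STD / MEAN                      # coefficient of variation
variance = STD ** 2                  # variance is the squared standard deviation
total = MEAN * N_OBSERVATIONS        # sum = mean * number of observations
value_range = 744 - 3                # maximum - minimum

print(f"CV       = {cv:.9f}")
print(f"Variance = {variance:.8f}")
print(f"Sum      = {total:.0f}")
print(f"Range    = {value_range}")
```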

socialNbFollows
Real number (ℝ≥0)

SKEWED

Distinct: 85
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.42567711
Minimum: 0
Maximum: 13764
Zeros: 39
Zeros (%): < 0.1%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8
Q1: 8
Median: 8
Q3: 8
95-th percentile: 8
Maximum: 13764
Range: 13764
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 52.83957192
Coefficient of variation (CV): 6.271255262
Kurtosis: 52718.3891
Mean: 8.42567711
Median Absolute Deviation (MAD): 0
Skewness: 220.8766787
Sum: 833409
Variance: 2792.02036
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
8                  94893  95.9%
9                   2386   2.4%
10                   618   0.6%
11                   260   0.3%
12                   148   0.1%
13                    94   0.1%
15                    55   0.1%
14                    53   0.1%
7                     52   0.1%
0                     39  < 0.1%
Other values (75)    315   0.3%

Minimum 5 values

Value  Count  Frequency (%)
0         39  < 0.1%
1          5  < 0.1%
2          8  < 0.1%
3          6  < 0.1%
4         11  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
13764      1  < 0.1%
8268       1  < 0.1%
3649       1  < 0.1%
2013       1  < 0.1%
500        1  < 0.1%

socialProductsLiked
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 420
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.420743482
Minimum: 0
Maximum: 51671
Zeros: 82987
Zeros (%): 83.9%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 8
Maximum: 51671
Range: 51671
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 181.0305695
Coefficient of variation (CV): 40.95025423
Kurtosis: 67765.24122
Mean: 4.420743482
Median Absolute Deviation (MAD): 0
Skewness: 244.1577429
Sum: 437269
Variance: 32772.06708
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                   82987  83.9%
1                    5261   5.3%
2                    1898   1.9%
3                    1215   1.2%
4                     973   1.0%
5                     644   0.7%
6                     532   0.5%
7                     436   0.4%
8                     359   0.4%
9                     316   0.3%
Other values (410)   4292   4.3%

Minimum 5 values

Value  Count  Frequency (%)
0      82987  83.9%
1       5261   5.3%
2       1898   1.9%
3       1215   1.2%
4        973   1.0%

Maximum 5 values

Value  Count  Frequency (%)
51671      1  < 0.1%
16040      1  < 0.1%
7044       1  < 0.1%
5979       1  < 0.1%
5598       1  < 0.1%

productsListed
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 65
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.09330421684
Minimum: 0
Maximum: 244
Zeros: 97189
Zeros (%): 98.3%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 244
Range: 244
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.050143546
Coefficient of variation (CV): 21.97267835
Kurtosis: 5760.301256
Mean: 0.09330421684
Median Absolute Deviation (MAD): 0
Skewness: 64.89321853
Sum: 9229
Variance: 4.203088557
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
0                  97189  98.3%
1                    808   0.8%
2                    278   0.3%
3                    150   0.2%
4                     98   0.1%
5                     62   0.1%
6                     45  < 0.1%
7                     40  < 0.1%
8                     29  < 0.1%
10                    22  < 0.1%
Other values (55)    192   0.2%

Minimum 5 values

Value  Count  Frequency (%)
0      97189  98.3%
1        808   0.8%
2        278   0.3%
3        150   0.2%
4         98   0.1%

Maximum 5 values

Value  Count  Frequency (%)
244        1  < 0.1%
217        1  < 0.1%
202        1  < 0.1%
185        1  < 0.1%
123        1  < 0.1%

productsSold
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 75
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1215917018
Minimum: 0
Maximum: 174
Zeros: 96877
Zeros (%): 97.9%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 174
Range: 174
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.126895354
Coefficient of variation (CV): 17.49210943
Kurtosis: 2355.673441
Mean: 0.1215917018
Median Absolute Deviation (MAD): 0
Skewness: 41.59563253
Sum: 12027
Variance: 4.523683846
Monotonicity: Decreasing
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
0                  96877  97.9%
1                    917   0.9%
2                    325   0.3%
3                    154   0.2%
4                    124   0.1%
5                     58   0.1%
6                     58   0.1%
7                     45  < 0.1%
9                     42  < 0.1%
8                     31  < 0.1%
Other values (65)    282   0.3%

Minimum 5 values

Value  Count  Frequency (%)
0      96877  97.9%
1        917   0.9%
2        325   0.3%
3        154   0.2%
4        124   0.1%

Maximum 5 values

Value  Count  Frequency (%)
174        1  < 0.1%
170        1  < 0.1%
163        1  < 0.1%
152        1  < 0.1%
125        1  < 0.1%

productsPassRate
Real number (ℝ≥0)

ZEROS

Distinct: 72
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8123027307
Minimum: 0
Maximum: 100
Zeros: 97979
Zeros (%): 99.1%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 100
Range: 100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.500205194
Coefficient of variation (CV): 10.46433167
Kurtosis: 114.0391218
Mean: 0.8123027307
Median Absolute Deviation (MAD): 0
Skewness: 10.66729865
Sum: 80347.3
Variance: 72.25348834
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
0                  97979  99.1%
100                  441   0.4%
66                    63   0.1%
50                    57   0.1%
75                    42  < 0.1%
83                    25  < 0.1%
90                    25  < 0.1%
80                    22  < 0.1%
85                    20  < 0.1%
60                    16  < 0.1%
Other values (62)    223   0.2%

Minimum 5 values

Value  Count  Frequency (%)
0      97979  99.1%
25         5  < 0.1%
28         2  < 0.1%
31         1  < 0.1%
33         8  < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
100      441   0.4%
99         1  < 0.1%
98.7       1  < 0.1%
98         8  < 0.1%
96.4       1  < 0.1%

productsWished
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 279
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.562595412
Minimum: 0
Maximum: 2635
Zeros: 89612
Zeros (%): 90.6%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 2635
Range: 2635
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 25.19279323
Coefficient of variation (CV): 16.12240317
Kurtosis: 3369.163069
Mean: 1.562595412
Median Absolute Deviation (MAD): 0
Skewness: 49.25695941
Sum: 154561
Variance: 634.6768308
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
0                   89612  90.6%
1                    3375   3.4%
2                    1339   1.4%
3                     797   0.8%
4                     526   0.5%
5                     406   0.4%
6                     299   0.3%
7                     252   0.3%
8                     176   0.2%
9                     158   0.2%
Other values (269)   1973   2.0%

Minimum 5 values

Value  Count  Frequency (%)
0      89612  90.6%
1       3375   3.4%
2       1339   1.4%
3        797   0.8%
4        526   0.5%

Maximum 5 values

Value  Count  Frequency (%)
2635       1  < 0.1%
1916       1  < 0.1%
1900       1  < 0.1%
1842       1  < 0.1%
1820       1  < 0.1%

productsBought
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 70
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1719288668
Minimum: 0
Maximum: 405
Zeros: 93494
Zeros (%): 94.5%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 405
Range: 405
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.332265666
Coefficient of variation (CV): 13.56529424
Kurtosis: 11871.75975
Mean: 0.1719288668
Median Absolute Deviation (MAD): 0
Skewness: 84.79735987
Sum: 17006
Variance: 5.439463136
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
0                  93494  94.5%
1                   3297   3.3%
2                    845   0.9%
3                    364   0.4%
4                    214   0.2%
5                    139   0.1%
6                    108   0.1%
7                     65   0.1%
8                     52   0.1%
9                     40  < 0.1%
Other values (60)    295   0.3%

Minimum 5 values

Value  Count  Frequency (%)
0      93494  94.5%
1       3297   3.3%
2        845   0.9%
3        364   0.4%
4        214   0.2%

Maximum 5 values

Value  Count  Frequency (%)
405        1  < 0.1%
279        1  < 0.1%
174        1  < 0.1%
115        1  < 0.1%
105        1  < 0.1%

gender
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 772.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 98913
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: F
4th row: F
5th row: F
Value  Count  Frequency (%)
F      76121  77.0%
M      22792  23.0%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
F      76121  77.0%
M      22792  23.0%

Most occurring categories

Value             Count  Frequency (%)
Uppercase Letter  98913  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  98913  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  98913  100.0%

The per-category, per-script and per-block character tables repeat "Most occurring characters" exactly, because every character belongs to the single category, script and block listed above.

civilityGenderId
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 772.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 98913
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2
Value  Count  Frequency (%)
2      75684  76.5%
1      22792  23.0%
3        437   0.4%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
2      75684  76.5%
1      22792  23.0%
3        437   0.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  98913  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  98913  100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  98913  100.0%

The per-category, per-script and per-block character tables repeat "Most occurring characters" exactly, because every character belongs to the single category, script and block listed above.

civilityTitle
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 772.9 KiB

Length

Max length: 4
Median length: 3
Mean length: 2.773993307
Min length: 2

Characters and Unicode

Total characters: 274384
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: mr
2nd row: mrs
3rd row: mrs
4th row: mrs
5th row: mrs
Value  Count  Frequency (%)
mrs    75684  76.5%
mr     22792  23.0%
miss     437   0.4%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
m      98913  36.0%
r      98476  35.9%
s      76558  27.9%
i        437   0.2%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  274384  100.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  274384  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  274384  100.0%

The per-category, per-script and per-block character tables repeat "Most occurring characters" exactly, because every character belongs to the single category, script and block listed above.
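
The High correlation warnings that link gender, civilityGenderId and civilityTitle are consistent with the three columns encoding the same information. The sketch below checks this against the reported counts. Both mappings are assumptions inferred from the matching counts and sample rows; the report does not state them.

```python
from collections import Counter

# Value counts for civilityTitle, as reported above.
title_counts = {"mrs": 75684, "mr": 22792, "miss": 437}

# Assumed mappings (inferred from matching counts, not stated in the report).
TITLE_TO_ID = {"mr": 1, "mrs": 2, "miss": 3}
TITLE_TO_GENDER = {"mr": "M", "mrs": "F", "miss": "F"}

id_counts = Counter()
gender_counts = Counter()
for title, rows in title_counts.items():
    id_counts[TITLE_TO_ID[title]] += rows
    gender_counts[TITLE_TO_GENDER[title]] += rows

print(dict(id_counts))      # compare with the civilityGenderId table
print(dict(gender_counts))  # compare with the gender table
```

If the mappings hold, two of the three columns are redundant and could be dropped before modelling.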

hasAnyApp
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 96.7 KiB

Value  Count  Frequency (%)
False  72739  73.5%
True   26174  26.5%

(variable name not captured in this export)
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 96.7 KiB

Value  Count  Frequency (%)
False  94094  95.1%
True    4819   4.9%

hasIosApp
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 96.7 KiB

Value  Count  Frequency (%)
False  77386  78.2%
True   21527  21.8%

(variable name not captured in this export)
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 96.7 KiB

Value  Count  Frequency (%)
True   97018  98.1%
False   1895   1.9%

daysSinceLastLogin
Real number (ℝ≥0)

Distinct: 699
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 581.2912357
Minimum: 11
Maximum: 709
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 11
5-th percentile: 43
Q1: 572
Median: 694
Q3: 702
95-th percentile: 708
Maximum: 709
Range: 698
Interquartile range (IQR): 130

Descriptive statistics

Standard deviation: 208.8558881
Coefficient of variation (CV): 0.3592964684
Kurtosis: 1.388704906
Mean: 581.2912357
Median Absolute Deviation (MAD): 11
Skewness: -1.675425192
Sum: 57497260
Variance: 43620.782
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
702                  3838   3.9%
703                  3792   3.8%
695                  3677   3.7%
696                  3565   3.6%
701                  3516   3.6%
700                  3397   3.4%
693                  3384   3.4%
694                  3368   3.4%
705                  3328   3.4%
704                  3284   3.3%
Other values (689)  63764  64.5%

Minimum 5 values

Value  Count  Frequency (%)
11       811  0.8%
12       409  0.4%
13       344  0.3%
14       311  0.3%
15       235  0.2%

Maximum 5 values

Value  Count  Frequency (%)
709     2910  2.9%
708     2857  2.9%
707     2797  2.8%
706     2643  2.7%
705     3328  3.4%
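
The quantile statistics for daysSinceLastLogin are related by simple differences: Range = Maximum − Minimum and IQR = Q3 − Q1. The sketch below checks those two figures and shows one way to compute the same summary for raw data with the standard library. The helper name `quantile_summary` is illustrative, and the "inclusive" quantile method (linear interpolation over the sample) is an assumption about how the report's quartiles were obtained.

```python
import statistics


def quantile_summary(values):
    """Quartiles, IQR, range and median absolute deviation of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mad = statistics.median(abs(v - median) for v in values)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "mad": mad,
    }


# Cross-check of the reported figures for daysSinceLastLogin.
reported_range = 709 - 11    # Maximum - Minimum
reported_iqr = 702 - 572     # Q3 - Q1
print(reported_range, reported_iqr)

# Usage on a small made-up sample.
print(quantile_summary([1, 2, 3, 4, 5]))
```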

seniority
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3063.77187
Minimum: 2852
Maximum: 3205
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 2852
5-th percentile: 2853
Q1: 2857
Median: 3196
Q3: 3201
95-th percentile: 3205
Maximum: 3205
Range: 353
Interquartile range (IQR): 344

Descriptive statistics

Standard deviation: 168.2986205
Coefficient of variation (CV): 0.05493183815
Kurtosis: -1.816504427
Mean: 3063.77187
Median Absolute Deviation (MAD): 8
Skewness: -0.4270896795
Sum: 303046867
Variance: 28324.42566
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Common values

Value             Count  Frequency (%)
3199               6366   6.4%
3198               6126   6.2%
2857               5984   6.0%
2856               5945   6.0%
3197               5686   5.7%
3196               5577   5.6%
3200               5496   5.6%
3201               5487   5.5%
3205               5310   5.4%
2855               5267   5.3%
Other values (9)  41669  42.1%

Minimum 5 values

Value  Count  Frequency (%)
2852    2506  2.5%
2853    4824  4.9%
2854    5192  5.2%
2855    5267  5.3%
2856    5945  6.0%

Maximum 5 values

Value  Count  Frequency (%)
3205    5310  5.4%
3204    5070  5.1%
3203    4921  5.0%
3202    4622  4.7%
3201    5487  5.5%

seniorityAsYears
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.510424413
Minimum: 7.92
Maximum: 8.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 772.9 KiB

Quantile statistics

Minimum: 7.92
5-th percentile: 7.92
Q1: 7.94
Median: 8.88
Q3: 8.89
95-th percentile: 8.9
Maximum: 8.9
Range: 0.98
Interquartile range (IQR): 0.95

Descriptive statistics

Standard deviation: 0.4678629516
Coefficient of variation (CV): 0.05497527842
Kurtosis: -1.816316678
Mean: 8.510424413
Median Absolute Deviation (MAD): 0.02
Skewness: -0.4273113469
Sum: 841791.61
Variance: 0.2188957415
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Common values

Value  Count  Frequency (%)
8.88   22523  22.8%
8.89   21971  22.2%
7.93   16404  16.6%
7.94   15384  15.6%
8.9    15301  15.5%
7.92    7330   7.4%

Minimum 5 values

Value  Count  Frequency (%)
7.92    7330   7.4%
7.93   16404  16.6%
7.94   15384  15.6%
8.88   22523  22.8%
8.89   21971  22.2%

Maximum 5 values

Value  Count  Frequency (%)
8.9    15301  15.5%
8.89   21971  22.2%
8.88   22523  22.8%
7.94   15384  15.6%
7.93   16404  16.6%
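
The High correlation warning between seniority and seniorityAsYears is what one would expect if the second column is just the first one rescaled. The reported minimum, quartiles and maximum of both columns line up when seniority (in days) is divided by 360 and rounded to two decimals. That divisor is an assumption inferred from the figures, not something the report states. A minimal sketch of the check:

```python
# Assumption inferred from the reported values, not stated in the report.
DAYS_PER_YEAR = 360


def as_years(days):
    """Convert a seniority in days to years, rounded to two decimals."""
    return round(days / DAYS_PER_YEAR, 2)


# Minimum, Q1, median, Q3 and maximum reported for the two columns.
reported_pairs = {2852: 7.92, 2857: 7.94, 3196: 8.88, 3201: 8.89, 3205: 8.9}
for days, years in reported_pairs.items():
    print(days, as_years(days), years)

# Rows with 3203, 3204 or 3205 days all convert to 8.9 years under this rule.
days_counts = {3203: 4921, 3204: 5070, 3205: 5310}
rows_at_8_9 = sum(n for d, n in days_counts.items() if as_years(d) == 8.9)
print(rows_at_8_9)  # compare with the count reported for 8.9
```

Under this rule the two columns carry the same information, so one of them can be dropped without loss.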

countryCode
Categorical

HIGH CARDINALITY

Distinct: 199
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 772.9 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 197826
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): < 0.1%

Sample

1st row: gb
2nd row: mc
3rd row: fr
4th row: us
5th row: us
Value               Count  Frequency (%)
fr                  25135  25.4%
us                  20602  20.8%
gb                  11310  11.4%
it                   8015   8.1%
de                   6567   6.6%
es                   5706   5.8%
au                   2719   2.7%
dk                   1892   1.9%
se                   1826   1.8%
be                   1666   1.7%
Other values (189)  13475  13.6%

Histogram of lengths of the category

Most occurring characters

Value              Count  Frequency (%)
s                  28735  14.5%
r                  26825  13.6%
f                  25826  13.1%
u                  24113  12.2%
e                  16593   8.4%
b                  13298   6.7%
g                  12103   6.1%
i                   9586   4.8%
t                   9352   4.7%
d                   8670   4.4%
Other values (16)  22725  11.5%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  197826  100.0%

Most occurring scripts

Value  Count   Frequency (%)
Latin  197826  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  197826  100.0%

The per-category, per-script and per-block character tables repeat "Most occurring characters" exactly, because every character belongs to the single category, script and block listed above.

Interactions

[105 interaction plots (SVG figures rendered with Matplotlib v3.3.2) omitted]
2021-02-15T19:16:08.145328image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:08.450149image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:08.779854image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:09.167971image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:09.636301image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:10.129361image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:10.488480image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:10.909855image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:11.363986image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:11.703133image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:11.999954image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:12.327970image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:12.655984image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:12.999625image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:13.343232image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:13.671290image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:13.999303image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:14.409009image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:14.771858image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:15.084259image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:15.365426image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:15.677819image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:16.054790image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:16.508063image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:17.378298image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:17.748296image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
2021-02-15T19:16:18.165296image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
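
The calculation described above can be sketched in a few lines of plain Python. This is only an illustration, not the code the report itself runs: the function name `pearson_r` is hypothetical, and the sample values are the seniority and seniorityAsYears entries of the first five rows listed in the Sample section.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and of both standard deviations cancel,
    # so plain sums of (cross-)deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# seniority and seniorityAsYears of the first five sample rows
seniority = [3196, 3204, 3203, 3198, 2854]
seniority_as_years = [8.88, 8.90, 8.90, 8.88, 7.93]
r = pearson_r(seniority, seniority_as_years)

# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([2 * s + 10 for s in seniority], seniority_as_years)
```

On these five rows r comes out very close to 1, which is consistent with the high-correlation warning the report raises for seniority and seniorityAsYears.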

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
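
A minimal sketch of that recipe, assuming ties receive the average of the ranks they span (a common convention, not something this report states). The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and the input data is a toy example.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data: y = x**3 is monotonic but not linear in x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

Here ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.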

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
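
The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` is hypothetical. Libraries often default to the tie-adjusted τ-b instead (SciPy's `kendalltau` does), so values can differ when the data contains ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: two concordant pairs and one discordant pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

This direct loop visits every pair, so it is quadratic in the number of observations; for the 98913 rows profiled here that would be almost 4.9 billion pairs, which is why practical implementations use faster algorithms.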

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik project documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
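
As a hedged illustration, the sketch below computes the bias-corrected coefficient from two lists of category labels, following Bergsma's correction as it is commonly stated. The function name `cramers_v_corrected` and the toy labels are hypothetical, and the report's own implementation may differ in detail.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two sequences of category labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denom)


# Toy labels: the second variable is fully determined by the first.
gender = ["M"] * 50 + ["F"] * 50
title = ["mr"] * 50 + ["mrs"] * 50
v_perfect = cramers_v_corrected(gender, title)

# Toy labels: every combination is equally frequent, so there is no association.
v_independent = cramers_v_corrected(["M", "M", "F", "F"] * 25, ["a", "b", "a", "b"] * 25)
```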

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
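
The data behind both displays can be sketched in plain Python. The rows below are hypothetical toy records with missing values marked as `None` (the profiled dataset itself has no missing cells), and the helper names `nullity_matrix` and `present_counts` are assumptions made for illustration.

```python
def nullity_matrix(rows, columns):
    """One list of flags per row: True where a value is present, False where it is missing."""
    return [[row.get(col) is not None for col in columns] for row in rows]


def present_counts(matrix, columns):
    """Number of present (non-missing) values per column."""
    return {col: sum(flags[i] for flags in matrix) for i, col in enumerate(columns)}


columns = ["gender", "countryCode", "seniority"]
rows = [  # hypothetical records, not taken from the dataset
    {"gender": "M", "countryCode": "gb", "seniority": 3196},
    {"gender": "F", "countryCode": None, "seniority": 3204},
    {"gender": None, "countryCode": "fr", "seniority": 3203},
]

matrix = nullity_matrix(rows, columns)
counts = present_counts(matrix, columns)
```

Plotting `counts` as bars gives a per-column view of completeness, while drawing `matrix` row by row gives the pattern view.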

Sample

First rows

identifierHashlanguagesocialNbFollowerssocialNbFollowssocialProductsLikedproductsListedproductsSoldproductsPassRateproductsWishedproductsBoughtgendercivilityGenderIdcivilityTitlehasAnyApphasAndroidApphasIosApphasProfilePicturedaysSinceLastLoginseniorityseniorityAsYearscountryCode
0-1097895247965112460en14710772617474.01041M1mrTrueFalseTrueTrue1131968.88gb
12347567364561867620en167821917099.000F2mrsTrueFalseTrueTrue1232048.90mc
26870940546848049750fr13713603316394.0103F2mrsTrueFalseTrueFalse1132038.90fr
3-4640272621319568052en131101412215292.070F2mrsTrueFalseTrueFalse1231988.88us
4-5175830994878542658en1678025125100.000F2mrsFalseFalseFalseTrue2228547.93us
57631788075812383072de1301214712391.000F2mrsTrueFalseTrueFalse1131968.88de
6674361423306028463en121011403110894.0531105F3missTrueTrueFalseFalse1131988.88se
72550976450216757005fr5393510698.000F2mrsTrueFalseTrueTrue1128577.94fr
83718185418791028367it7441376451671010485.018420F2mrsTrueFalseTrueFalse1431958.88it
93908244093584862523en578451239274.062F3missTrueFalseTrueTrue1128567.93gb

Last rows

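 | identifierHash | language | socialNbFollowers | socialNbFollows | socialProductsLiked | productsListed | productsSold | productsPassRate | productsWished | productsBought | gender | civilityGenderId | civilityTitle | hasAnyApp | hasAndroidApp | hasIosApp | hasProfilePicture | daysSinceLastLogin | seniority | seniorityAsYears | countryCode
98903 | -2219367748414812248 | es | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | F | 2 | mrs | True | False | True | True | 112 | 3204 | 8.9 | es
98904 | 2896867688384676348 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | F | 2 | mrs | False | False | False | True | 708 | 3204 | 8.9 | gb
98905 | 3164321379397826945 | en | 3 | 8 | 6 | 0 | 0 | 0.0 | 0 | 0 | F | 2 | mrs | False | False | False | True | 655 | 3204 | 8.9 | us
98906 | -3379431417039360607 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | F | 2 | mrs | False | False | False | True | 708 | 3204 | 8.9 | ie
98907 | -5212100190867739388 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | F | 2 | mrs | False | False | False | True | 708 | 3204 | 8.9 | us
98908 | -5324380437900495747 | fr | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | M | 1 | mr | False | False | False | True | 708 | 3204 | 8.9 | us
98909 | -5607668753771114442 | fr | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | M | 1 | mr | True | False | True | True | 695 | 3204 | 8.9 | fr
98910 | 350630276238833248 | en | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | M | 1 | mr | True | True | False | True | 520 | 3204 | 8.9 | be
98911 | 2006580738726207028 | it | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | F | 2 | mrs | False | False | False | True | 267 | 3204 | 8.9 | it
98912 | -7621316584087253691 | fr | 3 | 8 | 0 | 0 | 0 | 0.0 | 0 | 0 | M | 1 | mr | True | False | True | True | 561 | 3204 | 8.9 | gn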